This notebook compares Alevin-fry and Spaceranger for quantifying spatial transcriptomics libraries. Two libraries, SCPCR000372 and SCPCR000373, were quantified with both Alevin-fry and Spaceranger, and the results were combined following the Alevin-fry tutorial. We compare that combined approach to using Spaceranger alone, without integrating Alevin-fry. All tools used an index built from Ensembl 104. Alevin-fry was run with both knee filtering and the unfiltered permit list mode; we also include Alevin-fry run with the --sketch alignment mode and the unfiltered permit list.
Note that SpatialExperiment was installed from GitHub at commit ddb15e0 in order to reflect the most recent changes to read10xVisium().
library(magrittr)
library(ggplot2)
library(SingleCellExperiment)
Loading required package: SummarizedExperiment
Loading required package: MatrixGenerics
Loading required package: matrixStats
Attaching package: ‘MatrixGenerics’
The following objects are masked from ‘package:matrixStats’:
colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse, colCounts, colCummaxs,
colCummins, colCumprods, colCumsums, colDiffs, colIQRDiffs, colIQRs, colLogSumExps,
colMadDiffs, colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats, colProds,
colQuantiles, colRanges, colRanks, colSdDiffs, colSds, colSums2, colTabulates,
colVarDiffs, colVars, colWeightedMads, colWeightedMeans, colWeightedMedians,
colWeightedSds, colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods, rowCumsums, rowDiffs,
rowIQRDiffs, rowIQRs, rowLogSumExps, rowMadDiffs, rowMads, rowMaxs, rowMeans2,
rowMedians, rowMins, rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars, rowWeightedMads,
rowWeightedMeans, rowWeightedMedians, rowWeightedSds, rowWeightedVars
Loading required package: GenomicRanges
package ‘GenomicRanges’ was built under R version 4.1.2
Loading required package: stats4
Loading required package: BiocGenerics
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:stats’:
IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, basename, cbind, colnames, dirname, do.call,
duplicated, eval, evalq, Filter, Find, get, grep, grepl, intersect, is.unsorted, lapply,
Map, mapply, match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply, union, unique, unsplit,
which.max, which.min
Loading required package: S4Vectors
package ‘S4Vectors’ was built under R version 4.1.2
Attaching package: ‘S4Vectors’
The following objects are masked from ‘package:base’:
expand.grid, I, unname
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with 'browseVignettes()'. To cite
Bioconductor, see 'citation("Biobase")', and for packages 'citation("pkgname")'.
Attaching package: ‘Biobase’
The following object is masked from ‘package:MatrixGenerics’:
rowMedians
The following objects are masked from ‘package:matrixStats’:
anyMissing, rowMedians
library(SpatialExperiment)
library(ggupset)
library(gridExtra)
Attaching package: ‘gridExtra’
The following object is masked from ‘package:Biobase’:
combine
The following object is masked from ‘package:BiocGenerics’:
combine
library(ggrepel)
library(clusterProfiler)
Registered S3 method overwritten by 'data.table':
method from
print.data.table
clusterProfiler v4.2.0 For help: https://yulab-smu.top/biomedical-knowledge-mining-book/
If you use clusterProfiler in published research, please cite:
T Wu, E Hu, S Xu, M Chen, P Guo, Z Dai, T Feng, L Zhou, W Tang, L Zhan, X Fu, S Liu, X Bo, and G Yu. clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. The Innovation. 2021, 2(3):100141
Attaching package: ‘clusterProfiler’
The following object is masked from ‘package:IRanges’:
slice
The following object is masked from ‘package:S4Vectors’:
rename
The following object is masked from ‘package:stats’:
filter
library(org.Hs.eg.db)
Loading required package: AnnotationDbi
package ‘AnnotationDbi’ was built under R version 4.1.2
Attaching package: ‘AnnotationDbi’
The following object is masked from ‘package:clusterProfiler’:
select
# set seed for ORA
set.seed(2021)
# load in benchmarking functions that will be used for copying data and generating sample tables
function_path <- file.path(".." ,"benchmarking-functions", "R")
file.path(function_path, list.files(function_path, pattern = "\\.R$")) %>%
purrr::walk(source)
# set up file paths
base_dir <- here::here()
# folder with alevin-fry and cellranger quants from S3
data_dir <- file.path(base_dir, "data", "spatial")
quants_dir <- file.path(data_dir, "data", "quants")
# results directory
results_dir <- file.path(data_dir, "results")
# sample name
sample_ids <- c("SCPCR000372", "SCPCR000373")
mito_file <- file.path(base_dir, "sample-info", "Homo_sapiens.GRCh38.104.mitogenes.txt")
# read in mito genes
mito_genes <- readr::read_tsv(mito_file, col_names = "gene_id")
Rows: 37 Columns: 1
── Column specification ────────────────────────────────────────────────────────────────────────────────
Delimiter: "\t"
chr (1): gene_id
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
mito_genes <- mito_genes %>%
dplyr::pull(gene_id) %>%
unique()
# download alevin fry and cellranger output
aws_copy_samples(local_dir = quants_dir,
s3_dir = "s3://nextflow-ccdl-results/scpca",
samples = sample_ids,
tools = c("alevin-fry-knee", "alevin-fry-unfiltered", "cellranger"))
Now let’s compare the three Alevin-fry + Spaceranger configurations (knee, unfiltered, and unfiltered with sketch) to Spaceranger alone. To do this, we read in the combined Alevin-fry + Spaceranger objects and the Spaceranger-only SpatialExperiment objects separately, then merge them into one list before grabbing the per spot and per gene quality metrics.
# get path to fry knee output directory
fry_knee_dir <- file.path(quants_dir, "alevin-fry-knee", sample_ids)
fry_knee_dir <- paste0(fry_knee_dir, "-Homo_sapiens.GRCh38.104.spliced_intron.txome-salign-cr-like-em-knee")
# get path to fry unfiltered output directory
fry_unfiltered_dir <- file.path(quants_dir, "alevin-fry-unfiltered", sample_ids)
fry_unfiltered_dir <- paste0(fry_unfiltered_dir, "-Homo_sapiens.GRCh38.104.spliced_intron.txome-salign-cr-like-em")
# get path to fry unfiltered sketch output directory
fry_sketch_dir <- file.path(quants_dir, "alevin-fry-unfiltered", sample_ids)
fry_sketch_dir <- paste0(fry_sketch_dir, "-Homo_sapiens.GRCh38.104.spliced_intron.txome-sketch-cr-like-em")
# paths to spatial folders
cellranger_folders <- paste0(sample_ids, "-GRCh38_104_cellranger_full-spatial")
spaceranger_dir <- file.path(quants_dir, "cellranger", cellranger_folders)
# read in combined fry and spaceranger spe for fry knee
fry_knee_spe_1 <- create_fry_spaceranger_spe(fry_knee_dir[1],
spaceranger_dir[1],
sample_ids[1])
Rows: 4992 Columns: 6
── Column specification ────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (1): barcode
dbl (5): in_tissue, array_row, array_col, pxl_row_in_fullres, pxl_col_in_fullres
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Joining, by = "barcode"
fry_knee_spe_2 <- create_fry_spaceranger_spe(fry_knee_dir[2],
spaceranger_dir[2],
sample_ids[2])
Rows: 4992 Columns: 6
── Column specification ────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (1): barcode
dbl (5): in_tissue, array_row, array_col, pxl_row_in_fullres, pxl_col_in_fullres
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Joining, by = "barcode"
# read in combined fry and spaceranger spe for fry unfiltered
fry_unfiltered_spe_1 <- create_fry_spaceranger_spe(fry_unfiltered_dir[1],
spaceranger_dir[1],
sample_ids[1])
Rows: 4992 Columns: 6
── Column specification ────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (1): barcode
dbl (5): in_tissue, array_row, array_col, pxl_row_in_fullres, pxl_col_in_fullres
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Joining, by = "barcode"
fry_unfiltered_spe_2 <- create_fry_spaceranger_spe(fry_unfiltered_dir[2],
spaceranger_dir[2],
sample_ids[2])
Rows: 4992 Columns: 6
── Column specification ────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (1): barcode
dbl (5): in_tissue, array_row, array_col, pxl_row_in_fullres, pxl_col_in_fullres
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Joining, by = "barcode"
# read in combined fry and spaceranger spe for fry unfiltered sketch
fry_sketch_spe_1 <- create_fry_spaceranger_spe(fry_sketch_dir[1],
spaceranger_dir[1],
sample_ids[1])
Rows: 4992 Columns: 6
── Column specification ────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (1): barcode
dbl (5): in_tissue, array_row, array_col, pxl_row_in_fullres, pxl_col_in_fullres
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Joining, by = "barcode"
fry_sketch_spe_2 <- create_fry_spaceranger_spe(fry_sketch_dir[2],
spaceranger_dir[2],
sample_ids[2])
Rows: 4992 Columns: 6
── Column specification ────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (1): barcode
dbl (5): in_tissue, array_row, array_col, pxl_row_in_fullres, pxl_col_in_fullres
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Joining, by = "barcode"
# spaceranger output paths
spaceranger_dir <- file.path(quants_dir, "cellranger", cellranger_folders)
# read in spaceranger output directly using read10xVisium
spaceranger_spe_1 <- read10xVisium(file.path(spaceranger_dir[1], "outs"), sample_id = sample_ids[1])
spaceranger_spe_2 <- read10xVisium(file.path(spaceranger_dir[2], "outs"), sample_id = sample_ids[2])
Now that we have read in the data and created a SpatialExperiment object for each sample and tool combination, we can combine them into one list and then calculate the per spot QC metrics using scuttle::addPerCellQCMetrics().
# create one list with all spe's together
all_spe_list <- list(fry_knee_spe_1, fry_unfiltered_spe_1,
fry_sketch_spe_1, spaceranger_spe_1,
fry_knee_spe_2, fry_unfiltered_spe_2,
fry_sketch_spe_2, spaceranger_spe_2)
# name each spe with combination of sample_id-tool
spe_names <- c("SCPCR000372-alevin-fry-knee", "SCPCR000372-alevin-fry-unfiltered",
"SCPCR000372-alevin-fry-unfiltered-sketch", "SCPCR000372-spaceranger",
"SCPCR000373-alevin-fry-knee", "SCPCR000373-alevin-fry-unfiltered",
"SCPCR000373-alevin-fry-unfiltered-sketch", "SCPCR000373-spaceranger")
names(all_spe_list) <- spe_names
# calculate per spot QC metrics, including the mitochondrial gene subset
all_spe_list <- all_spe_list %>%
purrr::map(
~ scuttle::addPerCellQCMetrics(.x,
subsets = list(mito = mito_genes[mito_genes %in% rownames(.x)])))
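For intuition, the per spot metrics added by scuttle::addPerCellQCMetrics() can be reproduced by hand on a small matrix. The sketch below uses only base R with invented counts and hypothetical gene names (`gene_a`, `mt_1`, and so on); it illustrates what the `sum`, `detected`, and `subsets_mito_percent` columns represent and is not the scuttle implementation.

```r
# toy counts: 4 genes (rows) x 3 spots (columns); all values are invented
toy_counts <- matrix(
  c(10, 0,  5,
     2, 3,  0,
     8, 1,  5,
     0, 0, 10),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("gene_a", "gene_b", "mt_1", "mt_2"),
                  c("spot_1", "spot_2", "spot_3"))
)

# hypothetical mito gene list; mt_3 is absent from the matrix,
# so we keep only genes that are present, as in the chunk above
toy_mito <- c("mt_1", "mt_2", "mt_3")
mito_present <- toy_mito[toy_mito %in% rownames(toy_counts)]

# total UMI per spot
total_umi <- colSums(toy_counts)
# number of genes with at least one count per spot
detected <- colSums(toy_counts > 0)
# percent of each spot's UMIs that come from the mito subset
mito_percent <- 100 * colSums(toy_counts[mito_present, , drop = FALSE]) / total_umi

toy_qc <- data.frame(sum = total_umi,
                     detected = detected,
                     subsets_mito_percent = mito_percent)
print(toy_qc)
```

Restricting the mito list to genes present in the matrix mirrors the `mito_genes %in% rownames(.x)` subset used above, which avoids indexing by gene names that a given index does not contain.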
After adding the per spot QC metrics to each of the spe’s, we want to extract the colData from each spe and create a data frame that we can use for plotting. We will also need some information about each sample and how it was run, so we will create a sample metadata table, sample_info_df, that will then be merged with the colData.
# create sample info dataframe to be joined with per spot dataframe later
sample_info_df <- quant_info_table(data_dir= quants_dir,
tools = c("cellranger", "alevin-fry-knee", "alevin-fry-unfiltered"),
samples = sample_ids) %>%
# convert cellranger to spaceranger and paste filtering strategy to alevin-fry
dplyr::mutate(tool = ifelse(tool == "cellranger", "spaceranger", paste(tool, filter_strategy, sep = "-")),
tool = dplyr::case_when(alevin_alignment == "sketch" ~ paste(tool, alevin_alignment, sep = "-"),
# keep the tool label unchanged otherwise, including when alevin_alignment is NA
TRUE ~ tool))
sample_info_df
To convert the colData to a data frame, we apply the custom function spatial_coldata_to_df() to each spe in our list.
fry_knee_names <- c("SCPCR000372-alevin-fry-knee", "SCPCR000373-alevin-fry-knee")
fry_unfiltered_names <- c("SCPCR000372-alevin-fry-unfiltered", "SCPCR000373-alevin-fry-unfiltered")
fry_unfiltered_sketch_names <- c("SCPCR000372-alevin-fry-unfiltered-sketch", "SCPCR000373-alevin-fry-unfiltered-sketch")
spaceranger_names <- c("SCPCR000372-spaceranger", "SCPCR000373-spaceranger")
# join coldata dataframe with sample info
coldata_df <- all_spe_list %>%
purrr::map_df(spatial_coldata_to_df, .id = "tool") %>%
# remove extra -1 from spaceranger barcodes
dplyr::mutate(spot_id = gsub("-1$", "", spot_id),
# remove tool from sample id
sample_id = stringr::word(sample_id, 1, sep = "-"),
# remove sample id from tool
tool = dplyr::case_when(tool %in% fry_knee_names ~ "alevin-fry-knee",
tool %in% fry_unfiltered_names ~ "alevin-fry-unfiltered",
tool %in% fry_unfiltered_sketch_names ~ "alevin-fry-unfiltered-sketch",
tool %in% spaceranger_names ~ "spaceranger")) %>%
dplyr::left_join(sample_info_df,
by = c("tool", "sample_id" = "sample")) %>%
# remove spots that are not overlapping tissue
dplyr::filter(in_tissue == 1)
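The string handling in the chunk above can be illustrated on its own. This base R sketch uses one of the list names from this notebook and an invented barcode (an assumption for illustration only). It shows one way to separate the sample ID from the tool label, and how anchoring the pattern with `$` removes only a trailing `-1` so that barcodes from Spaceranger and Alevin-fry can be matched.

```r
# one of the names assigned to all_spe_list
spe_name <- "SCPCR000372-alevin-fry-unfiltered-sketch"

# everything before the first hyphen is the sample ID
sample_part <- sub("-.*$", "", spe_name)
# everything after the first hyphen is the tool label
tool_part <- sub("^[^-]+-", "", spe_name)

# invented barcode, with and without the "-1" suffix
barcodes <- c("AAACAAGTATCTCCCA-1", "AAACAAGTATCTCCCA")
# the $ anchor restricts the match to the end of the string
clean_barcodes <- sub("-1$", "", barcodes)

print(sample_part)
print(tool_part)
print(clean_barcodes)
```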
Now we want to filter our data frame to contain only spots that are shared across all four tool configurations.
# identify shared spots only
spot_counts <- coldata_df %>%
dplyr::count(spot_id, sample_id)
# how many spots are shared among the tools
spot_counts_plot <- coldata_df %>%
dplyr::group_by(spot_id, sample_id) %>%
dplyr::summarise(tools_detected = list(unique(tool)))
`summarise()` has grouped output by 'spot_id'. You can override using the `.groups` argument.
ggplot(spot_counts_plot, aes(x = tools_detected))+
geom_bar() +
scale_x_upset(n_intersections = 4)
The majority of the spots are found both with Spaceranger alone and in the combinations with Alevin-fry-knee and Alevin-fry-unfiltered, with a small subset identified in Spaceranger and Alevin-fry-unfiltered but not with the knee method. Using Alevin-fry-unfiltered therefore recovers some spots that the knee method misses, and we do not see any loss of spots.
Let’s filter to only include these common spots.
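# keep spot/sample pairs detected by all four tool configurations
common_spots <- spot_counts %>%
dplyr::filter(n == 4)
# match on both spot and sample so a barcode is only kept in the sample where it is shared
coldata_df_common <- coldata_df %>%
dplyr::semi_join(common_spots, by = c("spot_id", "sample_id"))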
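The shared-spot filter can be sketched on a toy table. The example below is base R only, with invented spot IDs, sample IDs, and tool names. It counts how many tools report each spot within each sample and keeps only the spot and sample pairs reported by every tool. Matching on both columns matters because the same barcode can occur in more than one sample.

```r
# toy per-spot table: one row per spot, sample, and tool (all values invented)
toy_df <- data.frame(
  spot_id   = c("AAA", "AAA", "AAA", "CCC", "CCC", "AAA", "AAA"),
  sample_id = c("s1",  "s1",  "s1",  "s1",  "s1",  "s2",  "s2"),
  tool      = c("tool_a", "tool_b", "tool_c",
                "tool_a", "tool_b",
                "tool_a", "tool_b"),
  stringsAsFactors = FALSE
)
n_tools <- length(unique(toy_df$tool))

# count the number of tools reporting each spot within each sample
toy_spot_counts <- aggregate(tool ~ spot_id + sample_id, data = toy_df, FUN = length)
names(toy_spot_counts)[3] <- "n"

# keep spot/sample pairs seen by every tool
common <- toy_spot_counts[toy_spot_counts$n == n_tools, c("spot_id", "sample_id")]

# filter the original table on both spot and sample
toy_common <- merge(toy_df, common, by = c("spot_id", "sample_id"))
print(toy_common)
```

Here spot `AAA` is kept for sample `s1`, where all three toy tools report it, but not for sample `s2`, where only two do.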
We will also need to filter the spe’s directly to keep only spots that overlap tissue, so we create a small function to do this and then apply it to each spe in the list.
# we will also want to filter the spe's directly
filter_spe <- function(spe){
# keep only spots that overlap tissue and return the filtered spe
spe[, spatialData(spe)$in_tissue == 1]
}
all_spe_filter <- all_spe_list %>%
purrr::map(filter_spe)
We will also want to visualize our results, so we make a custom function to plot them.
# custom function for plotting spe results and coloring by column of colData of choice
plot_spe <- function(spe, sample, column){
# plot spots only
p1 <- ggspavis::plotSpots(spe,
x_coord = "pxl_col_in_fullres",
y_coord = "pxl_row_in_fullres",
annotate = column) +
scale_color_viridis_c()
# plot with tissue underneath
p2 <- ggspavis::plotVisium(spe,
x_coord = "pxl_col_in_fullres",
y_coord = "pxl_row_in_fullres",
fill = column) +
scale_fill_viridis_c()
# arrange plots and add sample name as title
grid.arrange(p1, p2, nrow = 1, top = grid::textGrob(sample))
}
First we will look at the per spot metrics: percentage of mitochondrial reads per spot, total UMI per spot, and total genes detected per spot.
# % mitochondrial reads/ spot
ggplot(coldata_df_common, aes(x = tool, y = subsets_mito_percent, fill = tool)) +
geom_boxplot() +
facet_wrap(~ sample_id) +
theme_classic() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
ylab("Mito Percent") +
xlab("")
all_spe_filter %>%
purrr::iwalk(plot_spe, column = "subsets_mito_percent")
Scale for 'colour' is already present. Adding another scale for 'colour', which will replace the
existing scale.
Scale for 'fill' is already present. Adding another scale for 'fill', which will replace the existing
scale.
Overall, mitochondrial content is low and fairly similar across all tools.
# total UMI/ spot
ggplot(coldata_df_common, aes(x = sum, color = tool)) +
geom_density() +
facet_wrap(~ sample_id) +
theme_classic() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
ylab("Density") +
xlab("UMI/spot")
all_spe_filter %>%
purrr::iwalk(plot_spe, column = "sum")
Scale for 'colour' is already present. Adding another scale for 'colour', which will replace the
existing scale.
Scale for 'fill' is already present. Adding another scale for 'fill', which will replace the existing
scale.